Discovering Features Towards Recognising Textual Entailment
Abstract
Recognising Textual Entailment (RTE) is the task of judging whether one short piece of text, normally a single sentence, follows from another, longer piece of text. The task has been the focus of three consecutive challenges since 2005. Current approaches have explored increasingly sophisticated deep Natural Language Processing (NLP) techniques; however, the capability of shallow features has yet to be fully exploited. Furthermore, since the data for the task are drawn from different NLP categories, namely Question Answering (QA), Text Summarisation (SUM), Information Retrieval (IR) and Information Extraction (IE), the data and features of each category can be analysed individually to look for differences between the categories with respect to Recognising Textual Entailment. This thesis sets out to analyse the RTE datasets for features that can be used in classification, to examine the effectiveness of using only shallow NLP techniques, and to investigate whether data from different NLP categories possess different characteristics. The practicality of the shallow features and the performance of the RTE system are analysed in several experiments. The results indicate that shallow features are candidates for feature-based RTE systems, though further improvements require the extraction of deeper features. Moreover, a system that uses a different set of features for each individual NLP category has the potential to outperform, in accuracy, a single classifier trained on all the training data with all the features. Future work includes improving the current matching and alignment techniques, improving relation detection, and adding temporal normalisation.
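To make "shallow features" and "a different set of features for each NLP category" more concrete, the sketch below computes a few surface-level features for a text/hypothesis pair and then selects a category-specific subset of them. It is a minimal illustration under stated assumptions, not the thesis's implementation: the feature names (`word_overlap`, `length_ratio`, `negation_mismatch`), the negation cue list and the per-category subsets in `CATEGORY_FEATURES` are all hypothetical. The resulting vectors could, in principle, be passed to any standard feature-based classifier.

```python
import re

# Hypothetical negation cues, for illustration only (not taken from the thesis).
NEGATION_CUES = {"no", "not", "never", "none", "nobody", "nothing"}

# Hypothetical per-category feature subsets; the thesis's actual choices are not given here.
CATEGORY_FEATURES = {
    "QA": ["word_overlap", "negation_mismatch"],
    "SUM": ["word_overlap", "length_ratio"],
    "IR": ["word_overlap"],
    "IE": ["word_overlap", "length_ratio", "negation_mismatch"],
}


def tokenize(sentence: str) -> list[str]:
    """Lower-case alphanumeric tokens: a deliberately shallow tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def shallow_features(text: str, hypothesis: str) -> dict[str, float]:
    """Compute a few surface-level features for one text/hypothesis pair."""
    t_tokens, h_tokens = tokenize(text), tokenize(hypothesis)
    t_types, h_types = set(t_tokens), set(h_tokens)
    # Fraction of hypothesis word types that also occur in the text.
    overlap = len(t_types & h_types) / len(h_types) if h_types else 0.0
    # Hypothesis length relative to text length (hypotheses are usually shorter).
    length_ratio = len(h_tokens) / len(t_tokens) if t_tokens else 0.0
    # 1.0 when exactly one side contains a negation cue, else 0.0.
    t_neg = bool(t_types & NEGATION_CUES)
    h_neg = bool(h_types & NEGATION_CUES)
    return {
        "word_overlap": overlap,
        "length_ratio": length_ratio,
        "negation_mismatch": float(t_neg != h_neg),
    }


def feature_vector(category: str, text: str, hypothesis: str) -> list[float]:
    """Select the (hypothetical) feature subset assigned to the pair's NLP category."""
    features = shallow_features(text, hypothesis)
    return [features[name] for name in CATEGORY_FEATURES[category]]


if __name__ == "__main__":
    print(feature_vector("SUM", "The cat sat on the mat in the kitchen.", "The cat sat on the mat."))
    # [1.0, 0.6666666666666666]
```

Keeping one feature subset per category mirrors the idea of training a separate classifier for QA, SUM, IR and IE pairs rather than one classifier over all features; which features actually help each category would have to be established experimentally.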